Process Data

Extract data for 3 sites from each parquet file.

Reformat to wide dataframe.

Resample to X-minute intervals.

Add features.

Apply Min/Max scaler.

Save to /processed/

Imports and Setup

Extract data for sites

upstream site = 447_2_351

main site = 446_1_350

downstream site = 446_3_349
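A minimal sketch of the extract-and-reshape step for these three sites. The column names `site_id` and `timestamp` are assumptions (the real parquet schema is not shown here), and only `lane_vehicle_speed` is pivoted because it is the only measurement named in these notes.

```python
import pandas as pd

# Site IDs from the notes above, keyed by their role.
SITES = {
    "upstream": "447_2_351",
    "main": "446_1_350",
    "downstream": "446_3_349",
}

def extract_sites(df: pd.DataFrame) -> pd.DataFrame:
    """Keep the three sites and pivot to one row per timestamp."""
    # "site_id" and "timestamp" are assumed column names.
    subset = df[df["site_id"].isin(list(SITES.values()))]
    wide = subset.pivot_table(
        index="timestamp",
        columns="site_id",
        values="lane_vehicle_speed",
        aggfunc="mean",  # averages duplicates within a timestamp, if any
    )
    # Rename the columns from site ID to role (upstream/main/downstream).
    id_to_role = {site_id: role for role, site_id in SITES.items()}
    wide = wide.rename(columns=id_to_role)
    wide.columns.name = None
    return wide.sort_index()

# Tiny synthetic frame standing in for one parquet file.
raw = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2020-01-01 00:00", "2020-01-01 00:00", "2020-01-01 00:00",
        "2020-01-01 00:00", "2020-01-01 00:01",
    ]),
    "site_id": ["447_2_351", "446_1_350", "446_3_349", "999_9_999", "446_1_350"],
    "lane_vehicle_speed": [60.0, 55.0, 50.0, 10.0, 57.0],
})
wide = extract_sites(raw)
```

Timestamps where a site has no reading come out as NaN in the wide frame, which is what the gap-filling step then has to deal with.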

Fill in Gaps

Resample data

Warning: pandas resampling is just a groupby over time bins. Ideally we would create a new X-minute index and interpolate onto it properly using scipy, but for now we'll go with the resampling that pandas gives us.
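To make the difference concrete, here is a small sketch comparing the two approaches on synthetic data. The 5-minute interval is an assumed stand-in for X, and the interpolation uses pandas' own `interpolate(method="time")` rather than scipy, so it is an illustration of the idea and not the proper implementation the note has in mind.

```python
import numpy as np
import pandas as pd

# Irregularly spaced synthetic readings.
idx = pd.to_datetime(["2020-01-01 00:00", "2020-01-01 00:02", "2020-01-01 00:10"])
speed = pd.Series([50.0, 52.0, 60.0], index=idx, name="lane_vehicle_speed")

# Approach 1: resample, i.e. group readings into 5-minute bins and average.
# Empty bins come out as NaN.
binned = speed.resample("5min").mean()

# Approach 2: build a regular 5-minute index and interpolate onto it,
# weighting by the actual time distance between readings.
grid = pd.date_range(speed.index.min(), speed.index.max(), freq="5min")
interpolated = (
    speed.reindex(speed.index.union(grid))  # add the grid points as NaN
    .interpolate(method="time")             # fill them linearly in time
    .reindex(grid)                          # keep only the grid points
)
```

With this input the 00:05 bin is empty under resampling, while interpolation gives a value between the 00:02 and 00:10 readings.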

Add features

from timestamp:
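The notes do not list the timestamp features, but the scaler section below mentions sin and cos. One common choice, sketched here as an assumption, is a cyclical encoding of time of day. The column names `time_sin` and `time_cos` are hypothetical.

```python
import numpy as np
import pandas as pd

def add_time_features(df: pd.DataFrame) -> pd.DataFrame:
    """Add sin/cos of time of day, computed from a DatetimeIndex."""
    out = df.copy()
    # Minutes since midnight, as a plain float array.
    minutes = np.asarray(out.index.hour * 60 + out.index.minute, dtype=float)
    angle = 2 * np.pi * minutes / (24 * 60)
    # Hypothetical column names; both values lie in [-1, 1].
    out["time_sin"] = np.sin(angle)
    out["time_cos"] = np.cos(angle)
    return out

# Four timestamps six hours apart: 00:00, 06:00, 12:00, 18:00.
idx = pd.date_range("2020-01-01", periods=4, freq="360min")
features = add_time_features(pd.DataFrame({"lane_vehicle_speed": [50.0] * 4}, index=idx))
```

Encoding the time as a sin/cos pair keeps 23:59 and 00:00 close together, which a raw hour-of-day column would not.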

Min/Max Scaler

Scale all values to the range -1 to 1. The sin and cos features already lie in that range, so they are left essentially untouched and we can use the same scaler for everything.

Fit a second scaler on just lane_vehicle_speed, because later we'll need to unscale the predictions.
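A sketch of the scaling and unscaling arithmetic, written by hand in pandas so that the formula is visible. It is not necessarily the scaler the notebook uses, and the helper names are hypothetical. scikit-learn's `MinMaxScaler(feature_range=(-1, 1))` performs the same transformation.

```python
import numpy as np
import pandas as pd

def fit_minmax(df):
    """Return the per-column minimum and maximum."""
    return df.min(), df.max()

def scale(values, lo, hi):
    """Map [lo, hi] linearly onto [-1, 1]."""
    return 2 * (values - lo) / (hi - lo) - 1

def unscale(scaled, lo, hi):
    """Inverse of scale: map [-1, 1] back onto [lo, hi]."""
    return (scaled + 1) / 2 * (hi - lo) + lo

data = pd.DataFrame({
    "lane_vehicle_speed": [40.0, 50.0, 60.0],
    "time_sin": [-1.0, 0.0, 1.0],  # hypothetical feature, already in [-1, 1]
})
lo, hi = fit_minmax(data)
scaled = scale(data, lo, hi)

# Keep the speed parameters separately so that model predictions,
# which contain only the speed column, can be converted back.
speed_lo, speed_hi = lo["lane_vehicle_speed"], hi["lane_vehicle_speed"]
predictions = np.array([-1.0, 0.0, 0.5])  # made-up scaled predictions
speeds = unscale(predictions, speed_lo, speed_hi)
```

The sin/cos column is unchanged only when its observed minimum and maximum are exactly -1 and 1. On a short sample that does not cover a full cycle it would be stretched slightly.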

TODO: fit the min/max scaler on the training data only, then apply the fitted scaler to the test data.
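A sketch of what that could look like, assuming a chronological split. The split point and the synthetic values are made up for illustration. The minimum and maximum come from the training rows only, and the same parameters are then reused on the test rows.

```python
import pandas as pd

idx = pd.date_range("2020-01-01", periods=6, freq="5min")
wide = pd.DataFrame(
    {"lane_vehicle_speed": [40.0, 50.0, 60.0, 55.0, 45.0, 70.0]}, index=idx
)

# Chronological split; the position of the split is an assumption.
split = 4
train, test = wide.iloc[:split], wide.iloc[split:]

# Fit on the training data only.
train_min, train_max = train.min(), train.max()

def to_scaled(df):
    """Scale to [-1, 1] using the training min/max."""
    return 2 * (df - train_min) / (train_max - train_min) - 1

train_scaled = to_scaled(train)
test_scaled = to_scaled(test)
```

Because the test data never influences the parameters, scaled test values can fall outside the -1 to 1 range, as the 70.0 reading does here. That is expected and avoids leaking test information into training.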

Export scaled data

Plot

Scratch